Every picture tells a story … don’t it?

Sam Castillo and Brian A. Fannin

September 19, 2019

Where to find this

This presentation may be found at: https://pirategrunt.com/soa_symposium_2019/

Code to produce the examples and slides: https://github.com/PirateGrunt/soa_symposium_2019

What we’ll talk about

  • Communication efficiency
  • Practical advice for data visualization

Communication efficiency

9

Which of these two numbers is larger?

11

9

How about these two?

1011

1001

These?

IX

XI

And these?

9

B

These?

十一

These?

11

9

How about these?

For statisticians there always have to be comparisons; numbers on their own are not enough.

  • Gelman and Unwin

These two?

999999999

99999999999

These two?

999,999,999

99,999,999,999
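Digit grouping is the cheapest way to make magnitude visible. A minimal Python sketch of the step from the ungrouped pair to the grouped pair, using only the built-in format specification (the helper name `group_digits` is an illustration, not part of the talk's code):

```python
def group_digits(value):
    """Return an integer as text with a comma between each group of three digits."""
    return f"{value:,}"

small = 999999999
large = 99999999999

# Right-align the grouped text so that length alone signals which number is larger.
for value in (small, large):
    print(f"{group_digits(value):>14}")
```

Once grouped and right-aligned, the comparison becomes a comparison of lengths rather than a digit count.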

nine

9

neun

1001

IX

nueve

Arabic or Sanskrit numerals are no more legitimate than any other representation of numbers.
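To make the point concrete, here is a small Python sketch that decodes several of the notations from the earlier slides into the same two quantities. It assumes "1011" and "1001" were meant as binary and "B" as hexadecimal; `roman_to_int` is a hypothetical helper written for this illustration.

```python
def roman_to_int(numeral):
    """Decode a Roman numeral, subtracting a symbol that precedes a larger one."""
    values = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}
    total = 0
    for position, symbol in enumerate(numeral):
        value = values[symbol]
        following = numeral[position + 1:position + 2]
        if following and values[following] > value:
            total -= value  # e.g. the I in IX
        else:
            total += value
    return total

# Each pair decodes to the same quantities: eleven and nine.
representations = {
    "decimal": (int("11"), int("9")),
    "binary": (int("1011", 2), int("1001", 2)),
    "roman": (roman_to_int("XI"), roman_to_int("IX")),
    "hexadecimal": (int("B", 16), int("9", 16)),
}

for name, pair in representations.items():
    print(name, pair)
```

The quantity never changes; only the effort required to decode it does.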

Be prepared to accept that, in some circumstances, geometric primitives may be understood faster than written numbers.

This is actually too much information

This is better

Statistics maps a set of many numbers into a set of fewer numbers.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    3534    7854    9562    9973   11644   24476

metric         x
Min.       3,534
1st Qu.    7,854
Median     9,562
Mean       9,973
3rd Qu.   11,644
Max.      24,476

Summary statistics are always a reduction of information.
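The six values shown earlier have the shape of R's `summary()` output. As a rough Python counterpart, the sketch below reduces any list of numbers to the same six values. The function name and the sample data are assumptions for illustration; the "inclusive" quantile method is used because, to my understanding, it matches R's default quantile rule.

```python
import statistics

def six_number_summary(values):
    """Reduce a list of numbers to min, quartiles, mean, and max."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "Min.": min(values),
        "1st Qu.": q1,
        "Median": median,
        "Mean": statistics.fmean(values),
        "3rd Qu.": q3,
        "Max.": max(values),
    }

# Hypothetical data: 200 numbers go in, six come out.
sample = [3500 + 37 * i for i in range(200)]
for label, value in six_number_summary(sample).items():
    print(f"{label:<8}{value:>10,.0f}")
```

Many different data sets produce the same six numbers, which is exactly the information that the reduction throws away.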

Visualization presents (almost) all of the data. The reductions are made with our eyes.

Potential complaint

Focus on visual design places undue emphasis on superfluous characteristics like color, font, etc.

What’s the most important word in the text which follows?

The rate for territory X must be increased by 10.4%.

And this one?

The rate for territory X must be increased by 10.4%.

“Inessential” matters. Emphasis is an element of communication and therefore comprehension.

Are you ready to buy stock in this company?

This year we plan to build on last year’s renewed profitability.

Practical Advice for Data Visualization

A real life example: improving my diet

Speed test: which is healthier, left or right?

Limitations of the nutrition label

  • Serving sizes are inconsistent
  • Converting units requires a calculator
  • Each food has its own label
  • Data collection is slow

Instacart orders

My daily calories for the last four months

Setting a daily benchmark

Measure              Value
Calories              3000
Protein (g)            125
Total Fat (g)          150
Saturated Fat (g)        0
Trans Fat (g)            0
Cholesterol (mg)       600
Sodium (mg)            200
Carbs (g)              600
Dietary Fiber (g)       25
Sugars (g)              40

Percent of daily benchmark
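A minimal Python sketch of the arithmetic behind a percent-of-benchmark view. The benchmark values are copied from the table above; the intake figures and the function name are hypothetical and exist only to show the calculation. A benchmark of zero has no meaningful percentage, so the sketch reports `None` for it rather than dividing by zero.

```python
BENCHMARK = {
    "Calories": 3000, "Protein (g)": 125, "Total Fat (g)": 150,
    "Saturated Fat (g)": 0, "Trans Fat (g)": 0, "Cholesterol (mg)": 600,
    "Sodium (mg)": 200, "Carbs (g)": 600, "Dietary Fiber (g)": 25,
    "Sugars (g)": 40,
}

def percent_of_benchmark(intake, benchmark):
    """Express each measure as a percentage of its daily benchmark."""
    result = {}
    for measure, target in benchmark.items():
        actual = intake.get(measure, 0)
        # A zero target cannot be expressed as a percentage.
        result[measure] = None if target == 0 else round(100 * actual / target, 1)
    return result

# Hypothetical intake for one day (not the presenter's data).
one_day = {"Calories": 2400, "Protein (g)": 100, "Sugars (g)": 60}
for measure, pct in percent_of_benchmark(one_day, BENCHMARK).items():
    print(f"{measure:<20}{'n/a' if pct is None else f'{pct}%'}")
```

Putting every measure on the same percent scale is what allows calories, grams, and milligrams to share one axis.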

Exploration

  • Speed
  • Iteration
  • Agility
  • Many dimensions
  • Tidy data
  • R and Python

Communication

  • Simplicity
  • Professionalism
  • Consistency
  • PowerPoint, Tableau, PowerBI, D3.js

Visualization is a process

R for Data Science, Hadley Wickham: https://r4ds.had.co.nz/

Default template

With a custom template

Do Not Repeat Yourself (DRY)

Import then transform then visualize then add a custom theme
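One way to read this slide is that each stage is a small function and the theme lives in exactly one place. The sketch below is a plain-text stand-in for a real plotting library, written in Python; the data, function names, and "theme" are all hypothetical and are not taken from the talk's repository.

```python
import csv
import io

# Hypothetical input; the talk's real data is not reproduced here.
RAW_CSV = """territory,premium
X,1200
Y,800
X,300
"""

def import_data(text):
    """Import: parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: total the premium for each territory."""
    totals = {}
    for row in rows:
        key = row["territory"]
        totals[key] = totals.get(key, 0.0) + float(row["premium"])
    return totals

def visualize(totals, width=20):
    """Visualize: one text bar per territory, scaled to the largest total."""
    top = max(totals.values())
    return [f"{name} {'#' * round(width * value / top)}"
            for name, value in totals.items()]

def add_theme(lines, title):
    """Theme: the single shared place for titles and styling."""
    return [title, "-" * len(title), *lines]

chart = add_theme(visualize(transform(import_data(RAW_CSV))), "Premium by territory")
print("\n".join(chart))
```

Because styling is applied only in `add_theme`, changing the look of every chart means editing one function rather than every plot.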

The best graphs are easy to read

  • What are the y-axis “density” units?
  • How can this be translated into English?

The best graphs are easy to read

  • In English: “There are just over 2,000 hospitals with between 30 and 50 readmissions”
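The difference between the two versions of the graph comes down to the unit on the y-axis. A minimal Python sketch with hypothetical values and bin edges (not the hospital data from the slide): a count answers "how many observations fall in this bin?", while a density is that count divided by the number of observations and by the bin width, so that the bar areas sum to one.

```python
def counts_and_densities(values, edges):
    """Bin values into [edge, next edge) intervals; the last bin includes its upper edge."""
    n_bins = len(edges) - 1
    counts = [0] * n_bins
    for value in values:
        for i in range(n_bins):
            is_last = i == n_bins - 1
            if edges[i] <= value < edges[i + 1] or (is_last and value == edges[-1]):
                counts[i] += 1
                break
    total = sum(counts)  # assumes at least one value falls inside the bins
    densities = [count / (total * (edges[i + 1] - edges[i]))
                 for i, count in enumerate(counts)]
    return counts, densities

# Hypothetical readmission figures for six hospitals.
values = [1, 2, 3, 12, 15, 25]
edges = [0, 10, 20, 30]
counts, densities = counts_and_densities(values, edges)
for i, count in enumerate(counts):
    print(f"{edges[i]}-{edges[i + 1]}: count={count}, density={densities[i]:.4f}")
```

A count can be read aloud as a sentence about the data; a density has to be multiplied back by the bin width and the sample size before it can be.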

Graphs should be unambiguous

A step in the right direction

Showing uncertainty

Color wheels are helpful

##  [1] "#0000FF" "#8000FF" "#FF00FF" "#FF0080" "#FF0000" "#FF7F00" "#FFFF00"
##  [8] "#80FF00" "#00FF00" "#00FF7F" "#00FFFF" "#0080FF"
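A wheel like the one listed above can be sketched in Python by stepping the hue in equal increments at full saturation and brightness. This is an illustration using the standard `colorsys` module, not the code that produced the slide; the starting hue of 240 degrees (blue) is an assumption read off the first value, and intermediate shades may differ from the slide by one unit because of rounding.

```python
import colorsys

def color_wheel(n=12, start_degrees=240):
    """Return n evenly spaced, fully saturated colors as hex strings."""
    colors = []
    for i in range(n):
        hue = ((start_degrees + i * 360 / n) % 360) / 360
        red, green, blue = colorsys.hsv_to_rgb(hue, 1.0, 1.0)
        colors.append("#{:02X}{:02X}{:02X}".format(
            round(red * 255), round(green * 255), round(blue * 255)))
    return colors

for color in color_wheel():
    print(color)
```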

Complementary colors are opposites on the color wheel

## [1] "#4682B4" "#B47846"
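Rotating a color halfway around the wheel keeps its largest and smallest RGB channels and mirrors each channel between them. A minimal Python sketch of that rule follows. The function name is hypothetical and the method is an assumption about how the pair above was produced, although it does turn the first value into the second.

```python
def complement(hex_color):
    """Return the color opposite hex_color on the wheel (hue rotated by 180 degrees)."""
    red, green, blue = (int(hex_color[i:i + 2], 16) for i in (1, 3, 5))
    # Mirror each channel around the midpoint of the largest and smallest channel.
    pivot = max(red, green, blue) + min(red, green, blue)
    return "#{:02X}{:02X}{:02X}".format(pivot - red, pivot - green, pivot - blue)

print(complement("#4682B4"))
```

Applying the function twice returns the original color, which is a quick check that the two really are opposites.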

Use the colors specific to your brand

Company website

R colors

Emotional intelligence matters

How to Win Friends and Influence People - Dale Carnegie

  1. Realize that you can’t ‘win’ an argument
  2. Let the other person see the pattern and come to the conclusion themselves
  3. Put yourself in the other person’s shoes
  4. Use familiar language
  5. Be friendly
  6. Get the other person saying “yes, yes” immediately

Three-point summary

  1. For exploration, focus on speed, iteration, and flexibility
  2. For communication, focus on professionalism, simplicity, and consistency
  3. Realize that data viz is just like any other mode of communication such as speech, text, and body language

Thank you!
